Referential equations for pulmonary diffusing capacity using GAMLSS models derived from Japanese individuals with near-normal lung function

Objective To generate appropriate reference values for the single-breath diffusing capacity of the lungs for carbon monoxide (DLCO), alveolar volume (VA), and the transfer coefficient of the lungs for carbon monoxide (KCO, often denoted as DLCO/VA) in the Japanese population. We also aimed to assess the applicability of these values to the Japanese population by comparing them with those published by the Global Lung Function Initiative in 2017 (GLI-2017) and with previous values. Methods In this retrospective study, we analyzed the spirometric indices, DLCO, VA, and KCO of Japanese individuals aged 16–85 years. Reference values were generated using the lambda, mu, and sigma (LMS) method implemented in the generalized additive models for location, scale, and shape (GAMLSS) package in R. Results A total of 390 tests were analyzed. The GLI-2017 z-scores for DLCO were approximately zero, whereas those for KCO and VA were far from zero. The mean squared errors of the present DLCO, VA, and KCO reference values were lower than those of the reference values derived from GLI-2017 and from previous linear regression equations. Conclusions The reference values obtained in this study were more appropriate for our sample than those reported in GLI-2017. The differences between the two sets of equations were attributable to GLI-2017 underestimating KCO (DLCO/VA) and overestimating VA in the Japanese population.


Introduction
The single-breath diffusing capacity of the lungs for carbon monoxide (DLCO) is a simple, noninvasive measurement used to diagnose and monitor patients with chronic lung diseases, such as chronic obstructive pulmonary disease (COPD) or interstitial lung disease (ILD) [1], and it is commonly used for the early detection of these diseases. However, there are no standardized reference values for DLCO, alveolar volume (VA), or the transfer coefficient of the lung for carbon monoxide (KCO, often denoted as DLCO/VA) in the Japanese population.
In 2017, the Global Lung Function Initiative (GLI) published new DLCO reference values for Caucasians aged 5–85 years (GLI-2017) [2]. Three retrospective studies have assessed the GLI-2017 reference values in various populations of healthy controls and patients [3–5]. The GLI-2017 reference values were derived with the lambda, mu, and sigma (LMS) method, using the generalized additive models for location, scale, and shape (GAMLSS) package in the statistical program R [6]. In 2020, the GLI updated its reference values for lung function tests (LFTs) in individuals of European ancestry, again using the LMS method in GAMLSS [7]. The GAMLSS modeling approach is well suited to deriving reference values for lung function outcomes [6,8,9]. Although GAMLSS-derived reference values for LFTs have been reported for the Japanese population [10], no standardized GAMLSS-derived reference values exist for DLCO.
The European Respiratory Society (ERS) and the American Thoracic Society (ATS) have updated their standards for measuring carbon monoxide gas transfer in the lungs, and additional technical guidelines are available [11,12]. However, there is no consensus on the most appropriate equations for different ethnic groups.
We aimed to develop GAMLSS models using collated contemporary DLCO (TLCO) data from Japanese patients without chronic lung diseases that reduce diffusing capacity, such as COPD or ILD, and to derive reference values for DLCO measurements. We then examined whether our predicted values fit the observed data more closely than those obtained from the frequently used GLI-2017 or linear prediction equations.

Study participants
This retrospective observational study was approved by the Ethics Committee of Shinshu University (permission number 5139) and was performed in accordance with the Declaration of Helsinki of the World Medical Association. The requirement for written informed consent was waived owing to the use of de-identified retrospective data. Instead, an opt-out consent model was used, which allowed participants to withdraw their consent at any time and have their information deleted from the registry.
The inclusion criteria were as follows: (1) Japanese patients without chronic lung diseases, such as COPD or ILD, who underwent DLCO (TLCO) measurement at their first medical consultation at our institute (Shinshu University Hospital, Matsumoto, Japan) between January 2008 and November 2021; (2) never smokers; (3) age 16–85 years; (4) body mass index (BMI) <30 kg/m²; (5) no abnormality or localized shadow on chest computed tomography (CT) performed within 6 months before the LFT; (6) percent predicted DLCO >80% according to the prediction equations of Nishida et al. and Burrows et al. [13,14]; and (7) ambulatory status. To ensure a sufficient sample size, patients with early-stage lung cancer, sarcoidosis, or asthma who had either no abnormal shadows or only small abnormal shadows that did not meet the exclusion criteria on CT screening were included. Given the retrospective design, it was not feasible to restrict inclusion to participants with completely normal CT findings. The exclusion criteria were as follows: (1) cardiovascular disease other than hypertension; (2) motor neuron disease; (3) chest wall disorder; (4) severe renal or liver dysfunction; (5) dementia or psychiatric disorder; (6) anemia; (7) abnormal shadows with a maximum diameter >50 mm on chest CT; and (8) other diseases potentially affecting respiratory function. In particular, ILD and COPD were excluded on the basis of imaging and LFTs.
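Several of the inclusion criteria are simple numeric checks. The sketch below expresses never-smoker status, the age range, the BMI limit, and ambulatory status as a filter; the `Candidate` record layout is a hypothetical illustration, not the study's data dictionary, and the CT findings, percent predicted DLCO, and exclusion diagnoses, which require clinical data, are deliberately not modelled.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    # Hypothetical record layout; field names are illustrative only.
    age_years: float
    height_cm: float
    weight_kg: float
    never_smoker: bool
    ambulant: bool


def bmi(c: Candidate) -> float:
    """Body mass index in kg/m^2."""
    height_m = c.height_cm / 100.0
    return c.weight_kg / (height_m * height_m)


def meets_numeric_criteria(c: Candidate) -> bool:
    """Inclusion criteria (2) never smoker, (3) age 16-85 years,
    (4) BMI < 30 kg/m^2, and (7) ambulatory status."""
    return (
        c.never_smoker
        and 16 <= c.age_years <= 85
        and bmi(c) < 30.0
        and c.ambulant
    )


print(meets_numeric_criteria(Candidate(45, 170, 65, True, True)))  # BMI ~22.5 -> True
print(meets_numeric_criteria(Candidate(45, 170, 95, True, True)))  # BMI ~32.9 -> False
```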

Lung function tests
All patients underwent LFTs, including spirometry, DLCO (TLCO), VA, residual volume, and total lung capacity, using a pulmonary function testing system (CHESTAC-8900; CHEST Co., Ltd., Tokyo, Japan). Our hospital is located 621 m above sea level. DLCO, KCO, and VA were measured with the single-breath method according to the ERS/ATS standards for measuring carbon monoxide gas transfer [11,12]. The anatomical dead space was fixed at 150 ml to obtain the reference values. DLCO was calculated using VA expressed in L at standard temperature and pressure, dry (STPD).
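The classical single-breath (Krogh) calculation that links VA, the fixed dead space, and DLCO can be sketched as follows. This is a minimal illustration, not the CHESTAC-8900 algorithm. It assumes that the inspired volume is supplied at BTPS, that only the fixed 150 ml anatomical dead space is subtracted (instrument dead space is ignored), and that barometric pressure is an input. All gas fractions, volumes, and timings in the example are hypothetical.

```python
import math

PH2O_MMHG = 47.0       # water vapour pressure at 37 °C (mmHg)
DEAD_SPACE_L = 0.150   # anatomical dead space fixed at 150 ml, as in this study


def btps_to_stpd(volume_l: float, pb_mmhg: float) -> float:
    """Convert a gas volume from BTPS (37 °C, saturated) to STPD (0 °C, 760 mmHg, dry)."""
    return volume_l * (pb_mmhg - PH2O_MMHG) / 760.0 * 273.15 / 310.15


def single_breath_dlco(vi_btps_l, fi_tracer, fa_tracer, fi_co, fa_co,
                       breath_hold_s, pb_mmhg):
    """Return (VA in L STPD, DLCO in ml/min/mmHg) from the Krogh equation."""
    # Alveolar volume by tracer-gas dilution: VA = (VI - VD) * FI,Tr / FA,Tr
    va_btps_l = (vi_btps_l - DEAD_SPACE_L) * fi_tracer / fa_tracer
    va_stpd_l = btps_to_stpd(va_btps_l, pb_mmhg)
    # Alveolar CO fraction at the start of breath-hold, corrected for dilution
    fa_co_0 = fi_co * fa_tracer / fi_tracer
    t_min = breath_hold_s / 60.0
    # DLCO = VA(ml STPD) / (t(min) * (PB - PH2O)) * ln(FA,CO,0 / FA,CO)
    dlco = (va_stpd_l * 1000.0) / (t_min * (pb_mmhg - PH2O_MMHG)) * math.log(fa_co_0 / fa_co)
    return va_stpd_l, dlco


# Hypothetical manoeuvre: 4.0 L inspired, 10% tracer diluted to 7.5%, 0.3% CO, 10 s breath-hold
va, dlco = single_breath_dlco(4.0, 0.10, 0.075, 0.003, 0.0012, 10.0, 706.0)
print(round(va, 2), round(dlco, 1))  # 3.92 22.4
```

Note that, with VA converted to STPD, the (PB − 47) terms cancel algebraically, so barometric pressure does not change the computed value in this sketch; any physiological effect of altitude is outside its scope.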
Regarding notation, we refer to the diffusing capacity of the lungs for carbon monoxide expressed in traditional units (ml/min/mmHg) as DLCO and in SI units (mmol/min/kPa) as TLCO.
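The factor relating the two notations follows from the molar volume of an ideal gas at STPD (22.414 ml/mmol) and from 1 kPa = 760/101.325 mmHg. The sketch below derives the factor instead of hard-coding a rounded constant. Published standards round it; for example, about 0.335 converts ml/min/mmHg to mmol/min/kPa. The function names are hypothetical.

```python
MOLAR_VOLUME_ML_PER_MMOL = 22.414   # ideal-gas molar volume at STPD
MMHG_PER_KPA = 760.0 / 101.325      # about 7.5006 mmHg per kPa

# 1 mmol/min/kPa expressed in ml/min/mmHg (about 2.988)
SI_TO_TRADITIONAL = MOLAR_VOLUME_ML_PER_MMOL / MMHG_PER_KPA


def tlco_to_dlco(tlco_mmol_min_kpa: float) -> float:
    """Convert TLCO (mmol/min/kPa) to DLCO (ml/min/mmHg)."""
    return tlco_mmol_min_kpa * SI_TO_TRADITIONAL


def dlco_to_tlco(dlco_ml_min_mmhg: float) -> float:
    """Convert DLCO (ml/min/mmHg) to TLCO (mmol/min/kPa)."""
    return dlco_ml_min_mmhg / SI_TO_TRADITIONAL


print(round(dlco_to_tlco(25.0), 2))  # 8.37
```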

Pulmonary function test equations for DLCO measurements
We used the equations of Nishida et al. and Burrows et al., which are often used in daily clinical practice in Japan, to determine the percent predicted DLCO (TLCO) and KCO (DLCO/VA) [13,14]. We required a normal percent predicted DLCO according to both of these previous linear equations because the predicted DLCO values obtained with different linear prediction equations differ significantly [15].
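The requirement of a normal result under both linear equations can be sketched as below. The predicted values are placeholders: the actual Nishida and Burrows coefficients are not reproduced here, and the equation labels are hypothetical.

```python
def percent_predicted(observed: float, predicted: float) -> float:
    """Observed value expressed as a percentage of the predicted value."""
    return 100.0 * observed / predicted


def normal_by_all_equations(observed, predicted_by_equation, threshold=80.0):
    """True only if percent predicted exceeds the threshold for every equation."""
    return all(
        percent_predicted(observed, predicted) > threshold
        for predicted in predicted_by_equation.values()
    )


# Hypothetical predictions for one patient with an observed DLCO of 20.0 ml/min/mmHg
predictions = {"equation_A": 22.0, "equation_B": 26.0}
print(normal_by_all_equations(20.0, predictions))  # False: 90.9% vs 76.9%
```

The same observed value can be normal under one equation and abnormal under another, which is why agreement with both was required.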

Statistical analysis
Values in the tables are presented as mean ± standard deviation. All DLCO, VA, and KCO (DLCO/VA) data were converted to z-scores using the GLI-2017 equations, assuming that the GLI-2017 equations for Caucasians were applicable to Japanese individuals. If the GLI-2017 equations provided a good fit, the derived DLCO z-scores would be expected to be distributed symmetrically around zero [3,16]. After generating GAMLSS equations from the present data, we converted the data of the model assessment group to z-scores using the present equations. We then tested whether the mean z-scores calculated with the present equations and with GLI-2017 differed from zero using one-sample t-tests in the model assessment groups. For a well-fitting equation, the z-scores should have a mean close to zero and a standard deviation close to one [16].
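Under the LMS framework used by GLI-2017, a measured value y is converted to a z-score from the sex-, age-, and height-specific L, M, and S values. The sketch below shows that conversion and the one-sample t statistic for testing whether the mean z-score differs from zero. It uses only the Python standard library, and the L, M, and S inputs are hypothetical. A p-value can then be taken from the t distribution with n − 1 degrees of freedom (for example, with `scipy.stats.ttest_1samp`).

```python
import math
import statistics


def lms_z_score(y: float, L: float, M: float, S: float) -> float:
    """LMS z-score: ((y/M)**L - 1) / (L*S), or ln(y/M)/S when L == 0."""
    if abs(L) < 1e-12:
        return math.log(y / M) / S
    return ((y / M) ** L - 1.0) / (L * S)


def one_sample_t(values, mu0: float = 0.0) -> float:
    """t statistic for H0: mean(values) == mu0."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1 denominator)
    return (mean - mu0) / (sd / math.sqrt(n))


# Hypothetical observations sharing one set of L, M, S values
z = [lms_z_score(y, 0.4, 24.0, 0.15) for y in (22.0, 25.5, 27.0, 23.0)]
print(round(statistics.fmean(z), 2), round(one_sample_t(z), 2))
```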
We developed separate prediction equations for DLCO (TLCO), VA, and KCO (DLCO/VA) in men and women, with age and height as potential predictors. GAMLSS was chosen as the modeling strategy because it supports numerous residual distributions, provides several link functions between predictors and outcomes, and allows each distribution parameter (median, variability, skewness, and kurtosis) to have its own predictors [2,17–20]. GAMLSS incorporates the LMS method for establishing reference equations, which corresponds to the Box-Cox-Cole-Green (BCCG) residual distribution in the R package gamlss. The BCCG distribution is based on Cole and Green's pioneering work in fitting a single smoothing term to each of the three distribution parameters [17]. The normal, BCCG, and Box-Cox power exponential (BCPE) distributions were all considered during model development [6,9,18]. We compared log and identity link functions to determine whether a predictor was needed for each distribution parameter (median μ, coefficient of variation σ, and skewness ν) and whether it should enter in its original or log-transformed form. For the median μ (the M parameter of LMS), height and age were considered as candidate predictors. For variability σ (the S parameter) and skewness ν (the L parameter), age was considered as a candidate predictor.
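Once L, M, and S have been estimated for a given sex, age, and height, any centile follows from inverting the LMS transformation. The LLN is conventionally placed at z = −1.645 (the 5th percentile), as in the GLI equations. The median function below mimics the structure described here, with log height and an age term on the log scale. Its coefficients and the age term (a simple quadratic standing in for a fitted spline) are hypothetical placeholders, not the fitted values from this study.

```python
import math
from typing import Callable

Z_LLN = -1.645  # 5th percentile


def lms_centile(z: float, L: float, M: float, S: float) -> float:
    """Value at z-score z: M * (1 + L*S*z)**(1/L), or M*exp(S*z) when L == 0."""
    if abs(L) < 1e-12:
        return M * math.exp(S * z)
    return M * (1.0 + L * S * z) ** (1.0 / L)


def median_from_predictors(height_cm: float, age_years: float, a: float, b: float,
                           age_spline: Callable[[float], float]) -> float:
    """Hypothetical GLI-style median: ln(M) = a + b*ln(height) + spline(age)."""
    return math.exp(a + b * math.log(height_cm) + age_spline(age_years))


# Placeholder coefficients; the lambda stands in for a fitted age spline
M = median_from_predictors(165.0, 60.0, -8.0, 2.2,
                           lambda age: -0.00008 * (age - 20.0) ** 2)
print(round(M, 1), round(lms_centile(Z_LLN, 0.5, M, 0.15), 1))
```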
The GAMLSS model with the lowest Bayesian information criterion (BIC) was selected as the best model. To evaluate predictive performance, we randomly assigned 4/5 of the individuals to model building and 1/5 to model assessment. Because computing BIC values for the GLI-2017 and previous linear prediction equations was not feasible, we compared the performance of the best GAMLSS models with that of the GLI-2017 and previous linear prediction equations in the Japanese data using mean squared errors (MSEs) [2,17–20].
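The model-selection and validation steps can be sketched as a reproducible random 4/5 versus 1/5 split within each sex, followed by BIC = −2 log-likelihood + k ln(n) for the competing fits. The log-likelihoods and parameter counts below are invented for illustration. In R, the gamlss package reports the same criterion via `GAIC(model, k = log(n))`. The seed and the rounding rule for the assessment-group size are assumptions, so the group sizes need not match those reported in the Results.

```python
import math
import random


def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Bayesian information criterion: -2 logL + k ln(n)."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


def split_build_assess(ids, assess_fraction=0.2, seed=2021):
    """Randomly assign about assess_fraction of ids to model assessment."""
    rng = random.Random(seed)
    shuffled = list(ids)
    rng.shuffle(shuffled)
    n_assess = round(len(shuffled) * assess_fraction)
    return shuffled[n_assess:], shuffled[:n_assess]  # (build, assess)


build, assess = split_build_assess(range(193))
# Hypothetical fits on the build set: (log-likelihood, number of parameters)
fits = {"NO": (-500.0, 6), "BCCG": (-490.0, 8), "BCPE": (-489.5, 10)}
scores = {name: bic(ll, k, len(build)) for name, (ll, k) in fits.items()}
best = min(scores, key=scores.get)
print(len(build), len(assess), best)  # 154 39 BCCG
```

Here the BCPE fit has a slightly higher likelihood than BCCG, but its two extra parameters cost more than they gain under the ln(n) penalty, so BCCG is selected.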

Study population
The study cohort comprised 390 Japanese patients (193 men and 197 women) aged 16 to 85 years, all with a BMI <30 kg/m² (Table 1). Men in the model assessment group were older and had lower vital capacity (VC), forced vital capacity (FVC), and forced expiratory volume in 1 s (FEV1) than those in the model building group. Women in the model assessment group were older than those in the model building group, but no pulmonary function index differed significantly between the two groups. To evaluate the present equations, we planned to randomly assign 1/5 of the patients to the model assessment group; ultimately, 39 men (20.2% of the male participants) and 37 women (18.8% of the female participants) were assigned to this group. Table 2 summarizes the age distribution of the model assessment participants, including the mean and 95% confidence interval for DLCO and DLCO/VA by age decade. As shown in Table 2, the 95% confidence interval for DLCO was wider in the younger age groups than in the elderly, indicating greater variation in DLCO. DLCO/VA was higher in the younger age groups and decreased with age, whereas VA varied less with age and increased with height. When our data were converted using the GLI-2017 equations, the z-scores for DLCO were approximately zero, whereas those for KCO and VA were far from zero (Fig 1). We tested the z-scores derived from the current and GLI-2017 equations using one-sample t-tests to examine whether the GLI-2017 equations could be applied to our data (Table 3). In the model assessment groups, almost all tests yielded p-values <0.01, except for DLCO (TLCO) in men, indicating disagreement between our observed data and the GLI-2017 predicted values. This suggests that the GLI-2017 prediction equations are not appropriate for the Japanese population.
Table 4 summarizes the best GAMLSS models for DLCO (TLCO) and VA in men and women. Height and age were independent predictors of each M (μ), requiring a natural logarithmic transformation of height and a spline function for age, consistent with the GLI-2017 equations. We selected the BCCG distribution over the normal and BCPE distributions for all prediction equations. The GLI-2017 reference values for DLCO were lower than our current values in women but consistent with our values in men (Fig 2A). In both men and women, the GLI-2017 reference values for KCO (DLCO/VA) were lower than our current values for all age decades (Fig 2B). In contrast, the GLI-2017 reference values for VA were higher than our calculated values for all age decades (Fig 2C).
Fig 3 depicts the predicted values and lower limits of normal (LLN) from the present study and from GLI-2017 in the Japanese population (n = 390) for participants aged 60 years across different heights. In contrast to the DLCO-age relationship, the GLI-2017 reference values for DLCO at this age were greater than our current values in men but consistent with our current values in women across heights (Fig 3A). In both men and women, the GLI-2017 reference values for KCO (DLCO/VA) were lower than our current values across all heights (Fig 3B), whereas those for VA were greater across all heights (Fig 3C).
The current KCO (DLCO/VA) reference values were higher than those of previous equations, whereas the current VA reference values were lower than those of GLI-2017 in both men and women (Fig 4). The S1 File illustrates the differences between our current mean reference values and those of GLI-2017, Nishida et al., and Burrows et al. (S1 Fig in S1 File).
We then compared the predictive performance of our equations and previous equations in terms of MSEs (Table 5). The MSEs for DLCO, KCO (DLCO/VA), and VA were smaller with our equations than with the GLI-2017 equations, suggesting that our equations predict these values better in the Japanese population.
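The MSE comparison amounts to averaging the squared differences between observed and predicted values in the assessment group for each set of equations. A minimal sketch with hypothetical numbers (not the study data):

```python
def mean_squared_error(observed, predicted) -> float:
    """Mean of squared differences between observed and predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


observed = [21.0, 18.5, 24.0, 19.0]     # hypothetical DLCO, ml/min/mmHg
pred_gamlss = [20.5, 19.0, 23.0, 19.5]  # hypothetical GAMLSS predictions
pred_other = [22.5, 20.0, 21.5, 21.0]   # hypothetical alternative predictions
print(mean_squared_error(observed, pred_gamlss) < mean_squared_error(observed, pred_other))  # True
```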

Discussion
This is the first study to model DLCO using the GAMLSS approach in patients with near-normal lung function and to assess the applicability of the GLI-2017 prediction equations to a Japanese patient cohort. Our main findings were threefold. First, the GLI-2017 prediction equations for Caucasians did not match the DLCO, KCO (DLCO/VA), and VA data of the contemporary Japanese patient population, highlighting the importance of developing prediction equations for the Japanese population. Second, we established prediction equations for DLCO, KCO (DLCO/VA), and VA in a Japanese population aged 16–85 years using GAMLSS models. Third, the GAMLSS models outperformed the GLI-2017 equations and previous linear regression equations in the Japanese population.
When the GLI-2017 prediction equations for Caucasians were applied to our study cohort (Fig 1), the z-scores for DLCO (TLCO) were relatively close to zero, particularly in men. However, the z-scores for KCO (DLCO/VA) were significantly higher than zero (1.51 ± 0.090) in men and women, whereas those for VA were significantly lower (−1.566 ± 0.075). Thus, the GLI-2017 prediction equations tended to underestimate KCO (DLCO/VA) and overestimate VA; because DLCO is the product of KCO and VA, these opposing errors largely offset each other, resulting in a relatively accurate estimate of DLCO in the Japanese population (Figs 2 and 3). These results may reflect ethnic differences between the Japanese population and the several Caucasian populations from which the GLI-2017 equations were derived [2,3]. This observation indicates that not only DLCO but also KCO and VA must be evaluated to ensure correct prediction.
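The offsetting of errors can be made explicit. Because KCO is defined as DLCO/VA, a prediction of DLCO behaves approximately like the product of the predictions for KCO and VA. Here a and b are hypothetical ratios of the GLI-2017 prediction to the value appropriate for a Japanese individual of the same sex, age, and height, with a < 1 for KCO and b > 1 for VA:

```latex
\[
D_{LCO} = K_{CO}\,V_A
\quad\Longrightarrow\quad
\widehat{D}_{LCO}^{\,\mathrm{GLI}} \approx \widehat{K}_{CO}^{\,\mathrm{GLI}}\,\widehat{V}_{A}^{\,\mathrm{GLI}}
= (a\,K_{CO})(b\,V_A) = ab\,D_{LCO},
\qquad
\ln\widehat{D}_{LCO}^{\,\mathrm{GLI}} - \ln D_{LCO} \approx \ln a + \ln b .
\]
```

For example, with illustrative values a = 0.85 and b = 1.15, ab = 0.9775. In that case the DLCO prediction is only about 2% low even though KCO and VA are each misestimated by about 15%. Because GLI-2017 provides a separate equation for DLCO, this product relation holds only approximately.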
Most studies in Asian populations, including Japan, and in some Caucasian populations have used linear regression models to generate DLCO prediction equations, even though the relationships between age, height, and DLCO are not always linear [21–23]. Our findings add to the growing body of evidence that DLCO indices decrease nonlinearly with increasing age over the range of 16 to 85 years. The GAMLSS method addresses this issue by accommodating nonlinear relationships between age, height, and DLCO indices. Our use of age and height as predictors in the DLCO models is consistent with previously reported GAMLSS prediction models in Caucasian populations. For example, Verbanck et al. demonstrated that TLCO (DLCO) decreases monotonically across the entire age range of a healthy Caucasian population, with variability remaining nearly constant between 20 and 80 years, an age range close to that of our study population [24].
According to the ATS and ERS, several factors affect pulmonary function, including age, sex, height, weight, and ethnic origin [2,7]. We developed prediction equations for DLCO indices and the corresponding LLN in our Japanese population. We also provide a calculator into which users can enter age and height to immediately obtain the corresponding predicted values, z-scores, and LLN for DLCO (TLCO), KCO (DLCO/VA), and VA (S1 Data).
This study had several strengths. First, we established reference values for DLCO by applying the GAMLSS method to data from Japanese participants aged 16–85 years; GAMLSS-based reference values for DLCO in the Japanese population were not available before this study. Second, secular trends in pulmonary function warrant periodically updated reference values for classifying results as normal or abnormal, so that they reflect the contemporary population. Our prediction equations were based on a representative sample of contemporary Japanese patients with near-normal lung function.
However, our study had some limitations. First, we compared our reference values with the GLI-2017 values for the Caucasian population because no GAMLSS equation for DLCO was available for the Japanese population. Second, the retrospective observational design made it impossible to ensure that all patients had normal CT findings. Moreover, we could not recruit healthy volunteers because the collection of raw LFT data is strictly regulated for Japanese patients. Third, we collected data from a single laboratory, although the sample size used to build the model was comparable to that of previous reports [18,21]. Factors such as the small sample size, single-center measurements, the altitude of our hospital, and the method of determining anatomical dead space may have contributed to discrepancies between our predictions and those of other researchers. Finally, as described in the Methods, to ensure a sufficient sample size, patients with early-stage lung cancer, sarcoidosis, or asthma who had no abnormal shadows or only small abnormal shadows that did not meet the exclusion criteria on CT screening were included. The use of lung function data from these patients to create and validate the prediction equations may have caused our equations to differ from other prediction equations.
In conclusion, our reference values derived from the Japanese population were more appropriate for our sample than the GLI-2017 values, and the differences between the two sets of equations are attributable to the underestimation of KCO (DLCO/VA) and the overestimation of VA by GLI-2017. Our study provides a basis for the future application of new GAMLSS-based reference equations to the assessment of DLCO in the Japanese population.